首页> 外文OA文献 >A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization
【2h】

A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

机译:一种非负矩阵的高性能并行算法   因式分解

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

Non-negative matrix factorization (NMF) is the problem of determining twonon-negative low rank factors $W$ and $H$, for the given input matrix $A$, suchthat $A \approx W H$. NMF is a useful tool for many applications in differentdomains such as topic modeling in text mining, background separation in videoanalysis, and community detection in social networks. Despite its popularity inthe data mining community, there is a lack of efficient parallel software tosolve the problem for big datasets. Existing distributed-memory algorithms arelimited in terms of performance and applicability, as they are implementedusing Hadoop and are designed only for sparse matrices. We propose a distributed-memory parallel algorithm that computes thefactorization by iteratively solving alternating non-negative least squares(NLS) subproblems for $W$ and $H$. To our knowledge, our algorithm is the firsthigh-performance parallel algorithm for NMF. It maintains the data and factormatrices in memory (distributed across processors), uses MPI for interprocessorcommunication, and, in the dense case, provably minimizes communication costs(under mild assumptions). As opposed to previous implementations, our algorithmis also flexible: (1) it performs well for dense and sparse matrices, and (2)it allows the user to choose from among multiple algorithms for solving localNLS subproblems within the alternating iterations. We demonstrate thescalability of our algorithm and compare it with baseline implementations,showing significant performance improvements.
机译:非负矩阵分解(NMF)是确定给定输入矩阵$ A $的两个非负低秩因子$ W $和$ H $的问题,从而使$ A \ W W $。 NMF是在不同领域中的许多应用程序的有用工具,例如文本挖掘中的主题建模,视频分析中的背景分离以及社交网络中的社区检测。尽管它在数据挖掘社区中很受欢迎,但是仍然缺少有效的并行软件来解决大型数据集的问题。现有的分布式内存算法在性能和适用性方面受到限制,因为它们是使用Hadoop实现的,并且仅适用于稀疏矩阵。我们提出了一种分布式内存并行算法,该算法通过迭代求解$ W $和$ H $的交替非负最小二乘(NLS)子问题来计算因式分解。据我们所知,我们的算法是NMF的第一个高性能并行算法。它将数据和因子矩阵保存在内存中(分布在处理器之间),使用MPI进行处理器间通信,在密集情况下,可证明地最小化了通信成本(在温和的假设下)。与以前的实现相反,我们的算法也很灵活:(1)它对于密集和稀疏矩阵表现良好,(2)它允许用户从多种算法中进行选择,以解决交替迭代中的localNLS子问题。我们演示了算法的可扩展性,并将其与基线实现进行了比较,显示出显着的性能改进。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号